Let's start by loading some data:
mydata <- read.csv("https://wwwedu.github.io/NYU_F22/Lab02/introdata.csv")
# Can also use '=' as assignment operator
# mydata = read.csv("https://wwwedu.github.io/NYU_F22/Lab02/introdata.csv")
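To see what `read.csv()` does without depending on the download, here is a minimal sketch that parses a small inline CSV through the function's `text` argument. The name `tiny` and its values are made up for illustration.

```r
# Parse CSV content supplied as a string instead of a file or URL.
# The first line is treated as the header and becomes the column names.
tiny <- read.csv(text = "x,y\n1,2\n3,4")

class(tiny)   # read.csv() returns a data frame
names(tiny)   # column names taken from the header row
nrow(tiny)    # number of data rows (header excluded)
```

The same call pattern applies to a URL or file path, as in the line above that loads `mydata`.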
To view the data, we can use the environment viewer in RStudio or any of the following:
# Try the other options by uncommenting one line at a time
# print(mydata)
# mydata
head(mydata)
# tail(mydata)
| | x | y |
|---|---|---|
| | <dbl> | <dbl> |
| 1 | 55.3846 | 97.1795 |
| 2 | 51.5385 | 96.0256 |
| 3 | 46.1538 | 94.4872 |
| 4 | 42.8205 | 91.4103 |
| 5 | 40.7692 | 88.3333 |
| 6 | 38.7179 | 84.8718 |
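Besides `head()`, a few base R functions describe the structure of a data frame rather than its contents. The sketch below uses a stand-in data frame (`demo`, a hypothetical name) built from the six rows printed above, so it runs without the download.

```r
# Stand-in for mydata: the six rows printed by head(mydata)
demo <- data.frame(
  x = c(55.3846, 51.5385, 46.1538, 42.8205, 40.7692, 38.7179),
  y = c(97.1795, 96.0256, 94.4872, 91.4103, 88.3333, 84.8718)
)

dim(demo)     # number of rows, then number of columns
names(demo)   # column names
str(demo)     # compact overview: type and first values of each column
```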
To extract a variable from the data frame (the x variable for example):
# mydata$x
# print(mydata$x)
head(mydata$x)
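The `$` operator is one of several equivalent ways to pull a column out of a data frame. A small sketch on a made-up data frame (`demo` and the `v1` to `v4` names are illustrative only):

```r
demo <- data.frame(x = c(55.3846, 51.5385, 46.1538),
                   y = c(97.1795, 96.0256, 94.4872))

v1 <- demo$x        # by name with $
v2 <- demo[["x"]]   # by name with double brackets
v3 <- demo[, "x"]   # by column name, matrix-style indexing
v4 <- demo[[1]]     # by position

# All four return the same numeric vector
identical(v1, v2) && identical(v2, v3) && identical(v3, v4)

# Single brackets without a comma keep the data frame structure instead
sub_df <- demo["x"]
class(sub_df)
```

The double-bracket form is handy when the column name is stored in a variable, since `$` needs the name typed literally.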
To assign to a new variable:
x <- mydata$x
y <- mydata$y
# my_variable <- mydata$x
# my.variable <- mydata$x
# MyVariable <- mydata$x
head(x)
For basic statistics:
mean(x)
sd(x)
cor(x,y)
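It can help to see what these functions compute. By default, `sd()` uses the sample formula with an `n - 1` denominator, and `cor()` returns the Pearson correlation. The sketch below checks both by hand on the six rows shown earlier; `xs`, `ys`, and the `manual_` names are illustrative, chosen so they do not overwrite `x` and `y`.

```r
xs <- c(55.3846, 51.5385, 46.1538, 42.8205, 40.7692, 38.7179)
ys <- c(97.1795, 96.0256, 94.4872, 91.4103, 88.3333, 84.8718)
n  <- length(xs)

# Sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((xs - mean(xs))^2) / (n - 1))

# Pearson correlation: sample covariance divided by the product of the sds
manual_cor <- sum((xs - mean(xs)) * (ys - mean(ys))) /
  ((n - 1) * sd(xs) * sd(ys))

all.equal(manual_sd, sd(xs))        # matches the built-in
all.equal(manual_cor, cor(xs, ys))  # matches the built-in
```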
To apply the same function to each column of the data frame, use one of R's 'apply' functions, a very useful feature of the language:
AllTheMeans <- sapply(mydata,mean)
AllTheMeans
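The members of the 'apply' family differ mainly in the shape of what they return. A minimal sketch on a made-up data frame (`demo` and the `m_` names are illustrative):

```r
demo <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

m_s <- sapply(demo, mean)              # simplifies to a named numeric vector
m_l <- lapply(demo, mean)              # always returns a list
m_v <- vapply(demo, mean, numeric(1))  # like sapply, but checks that each
                                       # result is a single number
m_c <- colMeans(demo)                  # dedicated shortcut for column means

m_s
m_l
```

`vapply()` is often preferred inside functions because it fails loudly when a result has an unexpected type or length.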
Or use prepackaged functions:
summary(mydata)
| x | y |
|---|---|
| Min. :22.31 | Min. : 2.949 |
| 1st Qu.:44.10 | 1st Qu.:25.288 |
| Median :53.33 | Median :46.026 |
| Mean :54.26 | Mean :47.832 |
| 3rd Qu.:64.74 | 3rd Qu.:68.526 |
| Max. :98.21 | Max. :99.487 |
Note that a function's output is an object like any other, so we can store it and extract just the part we need (here row 4 of the summary table, which holds the means):
my.table <- summary(mydata)
paste0("Vector X ",my.table[4,1])
paste0("Vector Y ",my.table[4,2])
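The table returned by `summary()` on a data frame stores formatted text, so `my.table[4,1]` is a string that includes the `Mean` label, not a number. When a numeric value is needed, one option is to summarize each column separately, which gives a numeric matrix that can be indexed by name. A sketch on a made-up data frame (`demo`, `txt`, and `stats` are illustrative names):

```r
demo <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

txt <- summary(demo)            # character table, meant for printing
is.character(txt)               # TRUE: entries are text such as "Mean   :2.50  "

stats <- sapply(demo, summary)  # numeric matrix, one column per variable
stats["Mean", "x"]              # a number, selected by row and column name
stats["Max.", "y"]
```

Indexing by name is also more robust than a position like `[4,1]`, which silently points at the wrong statistic if the layout changes.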
Basic plot:
plot(x, y)
# plot(mydata) # With only two variables, this produces the same scatterplot